GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics
نویسندگان
چکیده
Large scale graph processing is a major research area for Big Data exploration. Vertex centric programming models like Pregel are gaining traction due to their simple abstraction that allows for scalable execution on distributed systems naturally. However, there are limitations to this approach which cause vertex centric algorithms to under-perform due to poor compute to communication overhead ratio and slow convergence of iterative superstep. In this paper we introduce GoFFish a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex centric approach with the flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation.
منابع مشابه
GoFFish: A Framework for Distributed Analytics over Timeseries Graphs
Massive datasets from scientific instruments and enterprises were the initial Big Data frontiers. But these are being subsumed by complex, high-velocity data from ubiquitous sensors and social network streams. Such datasets are characterized by both temporal attributes and lateral relationships between them forming a graph structure, and scalable data analytics frameworks have not been adequate...
متن کاملScalable Analytics over Distributed Time-series Graphs using GoFFish
Graphs are a key form of Big Data, and performing scalable analytics over them is invaluable to many domains. As our ability to collect data grows, there is an emerging class of inter-connected data which accumulates or varies over time, and on which novel analytics – both over the network structure and across the time-variant attribute values – is necessary. We introduce the notion of time-ser...
متن کاملSubgraph Rank: PageRank for Subgraph-Centric Distributed Graph Processing
The growth of Big Data has seen the increasing prevalence of interconnected graph datasets that reflect the variety and complexity of emerging data sources. Recent distributed graph processing platforms offer vertex-centric and subgraphcentric abstractions to compose and execute graph analytics on commodity clusters and Clouds. Näıve translation of existing graph algorithms to these programming...
متن کاملNScale: Neighborhood-centric Analytics on Large Graphs
There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting in biological networks, finding social circles, personalized recommendations, link prediction, anomaly de...
متن کاملSPARTex: A Vertex-Centric Framework for RDF Data Analytics
A growing number of applications require combining SPARQL queries with generic graph search on RDF data. However, the lack of procedural capabilities in SPARQL makes it inappropriate for graph analytics. Moreover, RDF engines focus on SPARQL query evaluation whereas graph management frameworks perform only generic graph computations. In this work, we bridge the gap by introducing SPARTex, an RD...
متن کامل